IDC: quantitative evaluation benchmark of interpretation methods for deep text classification models

https://doi.org/10.1186/s40537-022-00583-6

Khaleel, Mohammed; Qi, Lei; Tavanapong, Wallapak; Wong, Johnny; Sukul, Adisak; Peterson, David A. M. (March 2022, Journal of Big Data)

Abstract
Recent advances in deep neural networks have achieved outstanding success in natural language processing tasks. Interpretation methods that provide insight into the decision-making process of these models have received an influx of research attention because of the success and the black-box nature of deep text classification models. Evaluation of these methods has been based on changes in classification accuracy or prediction confidence when the important words identified by these methods are removed. Because interpretation ground truth is lacking, there are no measurements of the actual difference between the predicted important words and humans’ interpretation. A large, publicly available interpretation ground truth has the potential to advance the development of interpretation methods, but manually labeling the important words in each document to create one is very time-consuming. This paper presents (1) IDC, a new benchmark for quantitative evaluation of interpretation methods for deep text classification models, and (2) an evaluation of six interpretation methods using the benchmark. The IDC benchmark consists of (1) three methods that generate three pseudo-interpretation ground truth datasets and (2) three performance metrics: interpretation recall, interpretation precision, and Cohen’s kappa inter-agreement. Findings: IDC-generated interpretation ground truth agrees with human annotators on sampled movie reviews. IDC identifies Layer-wise Relevance Propagation and the gradient-by-input methods as the winning interpretation methods in this study.
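
The abstract names the three metrics but does not define them. As a rough illustration only, the sketch below assumes set-based definitions: precision and recall are computed over the words an interpretation method marks as important versus the words marked in a ground truth, and Cohen's kappa is computed over binary per-token labels. The function names, the toy review, and these definitions are assumptions made for illustration, not the paper's implementation.

```python
from typing import Sequence, Set


def interpretation_precision(predicted: Set[str], truth: Set[str]) -> float:
    """Share of predicted important words that also appear in the ground truth.

    Assumed definition; returns 0.0 when nothing was predicted.
    """
    if not predicted:
        return 0.0
    return len(predicted & truth) / len(predicted)


def interpretation_recall(predicted: Set[str], truth: Set[str]) -> float:
    """Share of ground-truth important words that the method recovered.

    Assumed definition; returns 0.0 when the ground truth is empty.
    """
    if not truth:
        return 0.0
    return len(predicted & truth) / len(truth)


def cohens_kappa(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Cohen's kappa for two binary (0/1) labelings of the same tokens."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of tokens labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each labeler's marginal rate of "important".
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both labelings are constant and identical; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy review, not taken from the paper's data.
tokens = ["the", "plot", "was", "brilliant", "but", "slow"]
truth_words = {"brilliant", "slow"}      # pseudo ground truth
predicted_words = {"brilliant", "plot"}  # output of some interpretation method

precision = interpretation_precision(predicted_words, truth_words)
recall = interpretation_recall(predicted_words, truth_words)

# Convert both word sets to per-token binary labels for kappa.
predicted_labels = [int(t in predicted_words) for t in tokens]
truth_labels = [int(t in truth_words) for t in tokens]
kappa = cohens_kappa(predicted_labels, truth_labels)

print(precision, recall, round(kappa, 2))
```

In this toy case one of the two predicted words matches the ground truth, so precision and recall are both 0.5, while kappa is lower (0.25) because it discounts the agreement on unimportant tokens that would be expected by chance.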
